Abstract. In formal language theory, James Rogers published a series of innovative
papers generalising strings and trees to higher dimensions. Motivated by applications in
linguistics, his goal was to smoothly extend the core theory of the formal languages of 
strings and trees to these higher dimensions. 

Rogers' definitions focussed on a specific representation of higher dimensional trees. 
This paper presents an alternative approach which focusses more on their universal prop- 
erties and is based upon category theory, algebras, coalgebras and containers. Our ap- 
proach reveals that Rogers' trees are canonical constructions which are also particularly 
beautiful. We also provide new theoretical results concerning higher dimensional trees. 
Finally, we provide evidence for our devout conviction that clean mathematical theories 
provide the basis for clean implementations by showing how our abstract presentation 
makes computing with higher dimensional trees easier. 



1 Introduction 

Strings occur in the study of formal languages where they are used to define complexity 
classes such as those of regular expressions, context free languages, context sensitive lan- 
guages etc. Trees also play a multitude of different roles and are often thought of as 2- 
dimensional strings. For instance, there is a clear and well defined theory of tree automata, of 
tree transducers and other analogues of string-theoretic notions [6]. Indeed, the recent interest 
in XML and its focus on 2-dimensional data has brought the formal language theory of trees 
to a wider audience. 

In a series of innovative papers (see [11] and references therein), James Rogers asked how
one can formalise, and hence extend, the idea that trees are two-dimensional strings to higher 
dimensions. The desire to go up a dimension is very natural - for example a parser will 
turn a string into a tree. Thus higher dimensional trees will certainly arise when parsing 
2-dimensional trees and, more generally, when trees are considered not as part of the meta- 
theory of the formal languages of strings, but as objects worthy of their own study. Rogers 
came from a background in both formal languages and natural languages and his motivation 
to study higher dimensional trees was rooted in the use of the latter to study the former. For 
example, his paper discusses applications to Tree Adjoining Grammars, Government Binding 
Theory, and Generalised Phrase Structure Grammars. 

Rogers' work was highly imaginative and he certainly had great success in generalising for- 
mal language theory from strings and trees to higher dimensions. However, his approach to 
higher dimensional trees is very concrete and this makes his work notationally more cumbersome
than one might prefer. For example, Rogers defines a tree as a tree domain, ie a set
of paths satisfying the left-sibling and ancestor properties. Similarly, he defines higher
dimensional trees to be sets of higher dimensional paths satisfying higher dimensional versions
of the ancestor and left-sibling properties. These conditions are notationally quite cumber- 
some at the two dimensional level and this complexity is magnified at higher dimensions. 
This has practical consequences as it is our belief that clean mathematical foundations are 
required for clean implementations of both higher dimensional trees as data structures and the 
algorithms which manipulate them. In particular, implementing higher dimensional trees as 
higher dimensional tree domains involves the (potential) requirement to regularly verify that 
algorithms preserve the well-formedness condition of the set of higher dimensional paths in a 
higher dimensional tree domain. 

We provide a more abstract treatment of higher dimensional trees where the fundamental con- 
cept is not the path structure of tree domains but rather the notion of fixed point and initial 
algebra. When viewed through this categorical prism, Rogers' definitions and constructions 
become very succinct and elegant. This is a tribute to both the sophistication of category the- 
ory in capturing high level structure and also to Rogers' insight in recognising these structures 
as being of fundamental mathematical and computational interest. The overall contributions 
of this paper are thus as follows: 

- We provide a categorical reformulation of the definition of Rogers' higher dimensional
trees. Remarkably, the central construction in our reformulation is the hitherto unused 
quadrant of the space whose other members are the free monad, the completely iterative 
monad, and the cofree comonad. 

- To demonstrate that this research offers practical as well as theoretical insight, we use
this reformulation to show that classical results of Arbib and Manes on 'Machines in a 
category' apply to higher-dimensional automata. In particular, this gives procedures of 
determinisation and minimisation. 

- In a similar vein, we show that while clearly being comonadic, higher dimensional trees 
are also monadic in nature. This is an example of the kind of result that is both funda- 
mental and would be missed without the abstract categorical formulation. 

- We justify our belief that clean mathematical foundations lead to a clean computation
structure by implementing higher dimensional trees in the Haskell programming language.

Our intention with this research is to try to synthesise our abstract approach with the intuitions 
and applications of Rogers. This paper is just the beginning and we welcome feedback from 
our own community before involving people who work on higher dimensional trees and natural
languages. Connecting category theory, especially algebras and coalgebras, with other scientific
disciplines is an important and valuable goal if our ideas are to spread and we are also
to be open to influence from those outside of our field. To summarise what this paper offers,
we believe that our use of category theory tames the apparent complexity which Rogers'
definitions possess at first sight.

The paper is structured as follows. Section 2 follows parts of Rogers [11] and presents his 
notions of higher-dimensional trees and automata. Section 3 presents our reformulation of 
Rogers' notions using fixed-point equations and coalgebras. Section 4 shows that Rogers' 
higher dimensional trees are examples of containers which allows us to deduce several useful 



meta-theoretic results needed later. Section 5 defines a notion of deterministic higher dimensional
automaton and shows that the classical theorems of determinisation and minimisation
from automata theory hold.

2 Rogers's Higher Dimensional Trees

The most pervasive definition of (finitely branching) trees is via the notion of a tree domain. A
tree domain is an enumeration of the paths in a tree - since a path is a list of natural numbers,
a tree domain is a subset of lists of natural numbers. However, there should be two conditions
on sets of paths reflecting the fact that i) if a node has an (n+1)'th child, then there should be an
n'th child; and ii) all nodes apart from the root have a parent. Thus tree domains are defined
as follows

Definition 2.1 (Tree Domains). A tree domain $T \subseteq \mathbb{N}^*$ is a subset of lists of natural numbers
such that

- (LS): If $w.(n+1) \in T$, then $w.n \in T$

- (A): If $w.n \in T$, then $w \in T$

We use . for the concatenation of a list with an element. We call the first condition the left- 
sibling property (LS) and we call the second condition the ancestor property (A). Notice how 
tree domains, by focusing on paths, will inevitably lead to a process of computation dominated 
by the creation and consumption of sets of paths satisfying (LS) and (A). As we shall see later, 
tree domains and the paths in them can be treated more abstractly, and in a cleaner fashion, 
by the shapes and positions of the container reformulation of tree domains. 
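
To make the cost of the path-based view concrete, the following is a minimal Haskell sketch of a well-formedness check for finite tree domains. It assumes a finite set of paths is given as a list, and the names `Path`, `ancestorOK`, `leftSiblingOK` and `isTreeDomain` are ours, not Rogers'. Every algorithm that builds such a set would have to re-establish this check.

```haskell
-- A path is a list of natural numbers; w.n appends n to the end of w.
type Path = [Int]

-- (A): every non-root path w.n has its parent w in the set.
ancestorOK :: [Path] -> Bool
ancestorOK t = and [ init p `elem` t | p <- t, not (null p) ]

-- (LS): whenever w.(n+1) is in the set, so is w.n.
leftSiblingOK :: [Path] -> Bool
leftSiblingOK t =
  and [ (init p ++ [last p - 1]) `elem` t | p <- t, not (null p), last p > 0 ]

-- A finite set of paths, given as a list, is a tree domain when all
-- entries are natural numbers and both closure conditions hold.
isTreeDomain :: [Path] -> Bool
isTreeDomain t = all (all (>= 0)) t && ancestorOK t && leftSiblingOK t
```

For instance, `[[], [0], [1], [0,0]]` passes the check, while `[[], [1]]` fails (LS) and `[[0]]` fails (A).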

However, for now, we want to ask ourselves how the tree domains given above can be generalised
from being 2-dimensional structures to n-dimensional structures. In the 2-dimensional
case we had a notion of path as a list of natural numbers and then a tree domain consisted
of a set of paths satisfying the properties (LS) and (A). Rogers defines n-dimensional tree
domains by first defining what an n-dimensional path is and then defining an n-dimensional
tree domain to be a set of n-dimensional paths satisfying higher dimensional variants of (LS)
and (A). So what is an n-dimensional path? Notice that a natural number is a list of 1s and
hence a list of natural numbers is a list of lists of 1s. Thus

Definition 2.2 (Higher Dimensional Paths [11, Def 2.1]). The n-dimensional paths form an
$\mathbb{N}$-indexed set $P$ with $P_0 = 1$ (the one element set) and with $P_{n+1}$ defined to be the least set
satisfying

- $[] \in P_{n+1}$

- If $[x_1, \ldots, x_m] \in P_{n+1}$ and $x \in P_n$, then $[x_1, \ldots, x_m, x] \in P_{n+1}$

A simpler definition would be that $P_n = \mathrm{List}^n 1$ but we wanted to give Rogers' definition
to highlight its concreteness. Having defined the n-dimensional paths we can define the
n-dimensional tree domains as follows



Definition 2.3 (Higher Dimensional Tree Domains [11, Def 2.2]). Let $T_0 = \{0, 1\}$. The set
$T_{n+1}$ of (n+1)-dimensional tree domains consists of those subsets $T \subseteq P_{n+1}$ such that

- (HDLS): If $s \in P_{n+1}$, then $\{w \in P_n \mid s.w \in T\} \in T_n$

- (HDA): If $s.w \in T$, then $s \in T$

The first condition is the higher dimensional left sibling property (HDLS). It is slightly tricky
as, in higher dimensions, there is no unique left sibling and so one cannot simply say that if
a node has an (n+1)'th child then the node has an n'th child. (HDLS) solves this problem
by saying the immediate children of a node in an (n+1)-dimensional tree domain form an
n-dimensional tree domain. In the two dimensional case, (HDLS) is thus the requirement that
the children of a node in a tree form a list. (HDA) is a straightforward generalisation of the
2-dimensional ancestor property (A). The reader may wish to check that a one dimensional
tree domain is a set of lists over 1 closed under prefixes, that is, $T_1$ is bijective to List(1).
There are two zero dimensional tree domains which correspond to the empty tree and to the
tree which just contains one node and no children.
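
Returning to Definition 2.2, the observation that $P_n = \mathrm{List}^n 1$ can be written down directly in Haskell for small n. The sketch below is only an illustration under our own naming (`P0`, `P1`, `P2` and the conversion functions are not from Rogers' paper): it shows that a natural number is a list of units and that a 2-dimensional path is a list of such lists.

```haskell
-- P_0 = 1, P_1 = List 1, P_2 = List (List 1).
type P0 = ()
type P1 = [P0]
type P2 = [P1]

-- A natural number n is the list of n copies of the unique element of 1.
natToP1 :: Int -> P1
natToP1 n = replicate n ()

p1ToNat :: P1 -> Int
p1ToNat = length

-- A 2-dimensional path (a list of naturals) as an element of P_2, and back.
pathToP2 :: [Int] -> P2
pathToP2 = map natToP1

p2ToPath :: P2 -> [Int]
p2ToPath = map p1ToNat
```

On non-negative inputs the two conversions are mutually inverse, which is the bijection between lists of naturals and lists of lists of 1s used in the text.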

The notion of automata is central in formal language theory and generalises to higher dimen- 
sions in a straightforward way. Firstly, we must extend tree domains so that higher dimen- 
sional trees can actually store data - this is done by associating to each path in a tree domain, 
a piece of data to be stored there. 

Definition 2.4 (Labelled tree domains [11, Def 2.3]). A $\Sigma$-labelled tree domain is a mapping
$T \to \Sigma$, where $T$ is a tree domain and $\Sigma$ a set (called the alphabet). We denote the set
of n-dimensional $\Sigma$-labelled tree domains by $T_n(\Sigma)$.

Definition 2.5 (n-Automaton [11, Def 2.9]). An (n+1)-dimensional automaton over an
alphabet $\Sigma$ and a finite set of states $Q$ is a finite set of triples $(\sigma, q, T)$ where $\sigma \in \Sigma$, $q \in Q$
and $T$ is a $Q$-labelled tree domain of dimension n.

Rogers goes on to define when an (n+1)-automaton licenses (or accepts) an (n+1)-dimensional
tree as follows. A ($\Sigma$-labelled) local tree is an element of $\Sigma \times T_n(\Sigma)$. An (n+1)-dimensional
grammar over $\Sigma$ is a finite subset of $\Sigma \times T_n(\Sigma)$, ie a finite set of local trees. An element
$\lambda : T \to \Sigma$ in $T_{n+1}(\Sigma)$ is licensed by a grammar if for all $s \in T$, the pair $(\lambda(s), \lambda' : T' \to \Sigma)$
is in the grammar, where $T' = \{w \mid s.w \in T\}$ and $\lambda'(w) = \lambda(s.w)$. In other words, a tree
is licensed by a grammar if it is constructed from the local trees of the grammar. Note that,
forgetting the alphabet $\Sigma$, an automaton can be seen as a grammar over $Q$. An element in
$T_{n+1}(\Sigma)$ is now licensed by an automaton if it is an image of a $Q$-labelled tree licensed by
the grammar in which the label of the root of each local tree has been replaced with a
symbol in $\Sigma$ associated with that local tree in the automaton [11].

We will see in Section 5 that acceptance is more easily defined via the unique morphism from
the initial algebra of trees. For coalgebraists let us note here already that automata are coalgebras.
First, the notion of labelling means that n-dimensional tree domains form a functor
$T_n : \mathsf{Set} \to \mathsf{Set}$. In particular $T_0(X) = 1 + X$ and $T_1(X) = \mathrm{List}(X)$. Now, an (n+1)-dimensional
automaton over $\Sigma$ is just a finite set $Q$ and a function $Q \to \mathcal{P}(\Sigma \times T_n(Q))$.
Automata and their accepted languages will be discussed in detail in Section 5, but let us look
at two familiar examples already.

Example 2.6. A 1-automaton is essentially the standard notion of a non-deterministic string
automaton, that is, a function $Q \to \mathcal{P}(\Sigma \times (1 + Q))$ where each state can perform a $\Sigma$-transition
and either terminate or arrive at another state.



Example 2.7. A 2-automaton is a coalgebra $\delta : Q \to \mathcal{P}(\Sigma \times T_1(Q))$, that is, a relation
$\delta \subseteq Q \times (\Sigma \times \mathrm{List}(Q))$ which can be understood as a non-deterministic tree automaton (see
eg [6]): Given a state $q$ and a tree $\sigma(t_1, \ldots, t_n)$ the automaton tries to recognise the tree by
guessing a triple $(q, \sigma, [q_1, \ldots, q_n]) \in \delta$ and continuing this procedure in the states $q_i$ with
trees $t_i$. Whereas this coalgebraic definition has a top-down flavour, the accepted language is
most easily defined in an algebraic (bottom-up) fashion as follows. The relation $\delta$ gives rise
to a set of $Q$-labelled terms (or bottom-up computations) $C$ via

$$\frac{(q, \sigma, [q_1, \ldots, q_n]) \in \delta \qquad q_1 t_1 \in C \;\; \ldots \;\; q_n t_n \in C}{q\,\sigma(t_1, \ldots, t_n) \in C}$$

where $q\,t \in C$ means the automaton recognises the $\Sigma$-labelled tree $t$ starting from the state $q$.
One then defines, wrt a set of accepting states $Q_0$, that the automaton accepts a tree $t$ iff
$q\,t \in C$ and $q \in Q_0$.
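
The bottom-up reading of Example 2.7 can be sketched in Haskell as follows. This is an illustration under our own assumptions: the transition relation is a finite list of triples, finite sets are plain lists, and the names `Tree`, `Delta`, `recognisers`, `accepts` and `exampleDelta` are hypothetical.

```haskell
-- Sigma-labelled finitely branching trees sigma(t_1, ..., t_n).
data Tree s = Node s [Tree s]

-- A transition relation as a finite list of triples (q, sigma, [q_1..q_n]).
type Delta q s = [(q, s, [q])]

-- All states q such that "q t" is a bottom-up computation, computed by
-- first collecting the recognising states of every immediate subtree.
recognisers :: (Eq q, Eq s) => Delta q s -> Tree s -> [q]
recognisers delta (Node a ts) =
  [ q | (q, b, qs) <- delta
      , a == b
      , length qs == length ts
      , and (zipWith elem qs childStates) ]
  where childStates = map (recognisers delta) ts

-- Acceptance with respect to a set of accepting states.
accepts :: (Eq q, Eq s) => Delta q s -> [q] -> Tree s -> Bool
accepts delta q0 t = any (`elem` q0) (recognisers delta t)

-- Hypothetical automaton: 'a' labels leaves, 'b' labels binary nodes.
exampleDelta :: Delta Int Char
exampleDelta = [(0, 'a', []), (0, 'b', [0, 0])]
```

With `exampleDelta` and accepting states `[0]`, the tree `Node 'b' [Node 'a' [], Node 'a' []]` is accepted, whereas `Node 'b' [Node 'a' []]` is not, since no triple for `'b'` has a one-element list of states.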

3 Higher Dimensional Trees, Algebraically 

Despite being a natural generalisation of a 2-dimensional tree domain to an n-dimensional
tree domain, Definition 2.3 is very concrete. For example, formalising the notion of licensing
(following Definition 2.5 above) is tedious. We will show that a more abstract approach
to the definition of tree domains is possible. In particular, the 1-dimensional tree domains
are just the usual lists while the non-empty two-dimensional tree domains are known in the
functional programming community as rose trees with a simple syntax and semantics. That
is, categorically one may define $\mathrm{Rose}\,X = \mu Y. X \times \mathrm{List}\,Y$ and derive from this the equally
simple Haskell implementation

data Rose a = Node a [Rose a] 

What is really pleasant about this categorical/functional programming presentation of tree 
domains is that initial algebra semantics provides powerful methods for writing and reasoning 
about programs. In particular, it replaces fascination with the detailed representation of the 
structure of paths and the (LS) and (A) properties with the more abstract universal property 
of being an initial algebra. That is not to say paths are not important, just that they ought to be 
(in our opinion) a derived concept. Indeed, we show later in Theorem 4.6 how to derive the 
path algebra from the initial algebra semantics. 
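
As a small illustration of programming by initiality, the sketch below defines the fold for rose trees and uses it to compute sizes, depths and, as a derived notion, the set of paths of a tree. The function names (`foldRose`, `size`, `depth`, `paths`) are ours and the code is only meant to indicate the style, not to reproduce the construction of Theorem 4.6.

```haskell
data Rose a = Node a [Rose a]

-- The unique algebra morphism out of the initial algebra of Y |-> X x List Y.
foldRose :: (a -> [b] -> b) -> Rose a -> b
foldRose alg (Node x ts) = alg x (map (foldRose alg) ts)

-- Number of nodes.
size :: Rose a -> Int
size = foldRose (\_ ns -> 1 + sum ns)

-- Number of nodes on the longest root-to-leaf branch.
depth :: Rose a -> Int
depth = foldRose (\_ ds -> 1 + maximum (0 : ds))

-- Paths as a derived concept: the root has the empty path, and the i-th
-- child contributes its own paths prefixed by i.
paths :: Rose a -> [[Int]]
paths = foldRose (\_ pss -> [] : [ i : p | (i, ps) <- zip [0 ..] pss, p <- ps ])
```

The list returned by `paths` satisfies (LS) and (A) by construction, so no separate well-formedness check is needed.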

The natural question is whether we can give an initial algebra semantics for higher dimensional
trees. The answer is not just yes, but yes in a surprisingly beautiful and elegant manner.
As remarked earlier, the immediate children of a node in an (n+1)-dimensional tree should
form an n-dimensional tree. This is formalised in

Definition 3.1. Define a family of functors by

$$R_{-1} X = 0$$
$$R_{n+1} X = \mu Y. X \times T_n Y$$
$$T_n X = 1 + R_n X \qquad (n \geq -1)$$

Note that we intend $R_{n+1}X$ to be the set of non-empty (n+1)-dimensional $X$-labelled tree
domains while $T_{n+1}X$ is intended to be the set of empty or non-empty (n+1)-dimensional
$X$-labelled tree domains. Thus $R_{n+1}X$ should consist of an element of $X$ to be stored at
the root of the tree and a potentially empty n-dimensional tree domain labelled with further
tree domains. While one could start indexing at 0 by defining $R_0 X = X$, there is no harm in
starting one step before with the definition of (-1)-dimensional trees. As expected, calculations
show that



n     R_n X         T_n X
-1    0             1
0     X             1 + X
1     List+(X)      List(X)
2     Rose(X)       1 + Rose(X)

where List+(X) are the non-empty lists over X.
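
For the reader who wants to see the calculation, the first rows can be unfolded from Definition 3.1 as follows. The isomorphisms are the standard ones for lists and non-empty lists; nothing beyond the definition is assumed.

```latex
\begin{align*}
R_0 X &= \mu Y.\, X \times T_{-1} Y = \mu Y.\, X \times (1 + 0) \cong X \\
T_0 X &= 1 + R_0 X \cong 1 + X \\
R_1 X &= \mu Y.\, X \times T_0 Y \cong \mu Y.\, X \times (1 + Y) \cong \mathrm{List}^+(X) \\
T_1 X &= 1 + R_1 X \cong 1 + \mathrm{List}^+(X) \cong \mathrm{List}(X) \\
R_2 X &= \mu Y.\, X \times T_1 Y \cong \mu Y.\, X \times \mathrm{List}(Y) = \mathrm{Rose}(X)
\end{align*}
```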

In fact, one can go further and not just define a sequence of functors $R_n$ and $T_n$, but a higher
order functor which maps a functor $F$ to the functor sending $X$ to $\mu Y. X \times FY$. We find this
particularly interesting for both theoretical and practical reasons. At the theoretical level, we
note that this construction of a functor from a functor is the final piece of the jigsaw remarked
upon in [9] and summarised in





                    Monads                  Comonads
Initial Algebras    $\mu Y. X + FY$         $\mu Y. X \times FY$
Final Coalgebras    $\nu Y. X + FY$         $\nu Y. X \times FY$



In [9], the three other higher order functors were remarked upon as follows:

- The map sending a functor $F$ to the functor $X \mapsto \mu Y. X + FY$ is the free monad
construction

- The map sending a functor $F$ to the functor $X \mapsto \nu Y. X + FY$ is the free completely
iterative monad construction

- The map sending a functor $F$ to the functor $X \mapsto \nu Y. X \times FY$ is the cofree comonad
construction

Higher dimensional tree functors provide, to our knowledge, the first naturally arising instance
of the remaining quadrant of the table above. From [9], we have

Theorem 3.2. For any functor $F$, the map $X \mapsto \mu Y. X \times FY$ is a comonad.

At a practical level, this higher order functor translates into the following simple definition
of higher dimensional trees in Haskell, together with the canonical recursion combinator arising
from the initiality of higher dimensional trees and their comonadic structure. In the following,
Maybe is the Haskell implementation of the monad sending $X$ to $1 + X$.



data Rose f a = Rose a (Maybe (f (Rose f a)))
type Tree f a = Maybe (Rose f a)

data Rose0 a = Rose0 a        -- = \X -> X
type Rose1   = Rose Rose0     -- = \X -> List^+ (X)
type Rose2   = Rose Rose1     -- = \X -> Rose (X)
type Rose3   = Rose Rose2

-- Recursion combinator given by initiality.
cata :: Functor f => (a -> Maybe (f b) -> b) -> Rose f a -> b
cata g (Rose x xs) = g x (fmap (fmap (cata g)) xs)

instance Functor Rose0 where
  fmap f (Rose0 a) = Rose0 (f a)

instance Functor f => Functor (Rose f) where
  fmap f = cata act where act a t = Rose (f a) t

class Comonad f where
  root   :: f a -> a
  comult :: f a -> f (f a)

instance Comonad Rose0 where
  root (Rose0 x)   = x
  comult (Rose0 x) = Rose0 (Rose0 x)

instance Functor f => Comonad (Rose f) where
  root (Rose x xs)   = x
  comult (Rose x xs) = Rose (Rose x xs) (fmap (fmap comult) xs)
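
As a usage sketch, the recursion combinator can be exercised at dimensions 1 and 2. The block repeats the definitions it needs so that it stands alone; the helpers `fromNonEmpty`, `toList1` and `size2` are our own illustrative names and are not part of the definitions above.

```haskell
data Rose f a = Rose a (Maybe (f (Rose f a)))
data Rose0 a = Rose0 a
type Rose1 = Rose Rose0
type Rose2 = Rose Rose1

instance Functor Rose0 where
  fmap f (Rose0 a) = Rose0 (f a)

cata :: Functor f => (a -> Maybe (f b) -> b) -> Rose f a -> b
cata g (Rose x xs) = g x (fmap (fmap (cata g)) xs)

instance Functor f => Functor (Rose f) where
  fmap f = cata (\a t -> Rose (f a) t)

-- Dimension 1: a non-empty list, given by its head and tail.
fromNonEmpty :: a -> [a] -> Rose1 a
fromNonEmpty x []       = Rose x Nothing
fromNonEmpty x (y : ys) = Rose x (Just (Rose0 (fromNonEmpty y ys)))

-- Back to an ordinary Haskell list, written as a catamorphism.
toList1 :: Rose1 a -> [a]
toList1 = cata (\x rest -> x : maybe [] (\(Rose0 xs) -> xs) rest)

-- Dimension 2: the children of a node form a (possibly absent) non-empty
-- list of already computed sizes.
size2 :: Rose2 a -> Int
size2 = cata (\_ kids -> 1 + maybe 0 (sum . toList1) kids)
```

Note that `size2` never inspects paths: the children arrive as a one-dimensional tree of results, exactly as the equation $R_{n+1}X = \mu Y. X \times T_n Y$ suggests.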

As we have seen, higher dimensional trees are instances of canonical constructions which
always produce comonads. It is also well-known that $\mathrm{List}^+$ and List are monads. Less well
known is that Rose is a monad. Clearly $R_0$ is also a monad. Indeed we have

Theorem 3.3. For all $n \geq 0$, $R_n$ is a monad.

Space prevents us from detailing the proof of this theorem. However, it is important because
it allows computation with higher dimensional trees to be further simplified via the use of
the monadic notation available in Haskell to structure common patterns of computation. For
example, parsing and filtering become particularly simple.
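
For the two-dimensional case, one well-known monad structure on rose trees is the one used by `Data.Tree` in the Haskell containers library, where binding grafts the remaining subtrees after the children produced by the continuation. The sketch below reproduces that structure on the `Rose` type of this section; we do not claim that it is the structure constructed in the omitted proof of Theorem 3.3, and `expand` is a hypothetical example.

```haskell
import Control.Monad (ap)

data Rose a = Node a [Rose a] deriving (Eq, Show)

instance Functor Rose where
  fmap f (Node x ts) = Node (f x) (map (fmap f) ts)

instance Applicative Rose where
  pure x = Node x []
  (<*>)  = ap

instance Monad Rose where
  -- Substitute at the root, then append the substituted old subtrees.
  Node x ts >>= k = case k x of
    Node y us -> Node y (us ++ map (>>= k) ts)

-- Monadic notation: replace every label x by a small tree built from x.
expand :: Rose Int -> Rose Int
expand t = do
  x <- t
  Node x [Node (10 * x) []]
```

Applied to `Node 1 [Node 2 []]`, `expand` yields `Node 1 [Node 10 [], Node 2 [Node 20 []]]`.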

To summarise, we depart from Rogers in not defining higher dimensional trees in terms of
paths, but via the more abstract categorical notion of initial algebras. As a result, we take 
the functor Tn as primary as opposed to the set of tree domains which one may then label. 
This cleaner mathematical foundation reveals higher dimensional trees to be related to the 
fundamental constructions of the free monad, free completely iterative monad and cofree 
comonad. It also leads to a simple implementation of higher dimensional trees in Haskell. 

4 Containers 

Containers [8] are designed to represent those functors which are concrete data types and
those natural transformations which are polymorphic functions between such concrete data
types. Such data types include lists, trees etc, but not solutions of mixed variance recursive
domain equations such as $\mu X. (X \to X) + \mathbb{N}$. Containers take as primitive the idea that
a concrete data type consists of its general form, or shapes, and, given such a shape, a set of
positions where data can be stored. Since Rogers' n-dimensional trees certainly store data
at the nodes of the n-dimensional tree, it is natural to ask whether these trees are indeed
containers. In this section, we see that the functors $T_n$ and $R_n$ are indeed containers and point
out the following theoretical and practical consequences:



- Many properties of n-dimensional trees can be deduced from the fact that they are containers.
As just one example, our transformation of a non-deterministic automaton into a
deterministic one requires n-dimensional trees to preserve weak pullbacks. This follows
from the fact that n-dimensional trees are containers.

- While we choose not to take paths and tree domains as primitive in our treatment of higher
dimensional trees, paths are nevertheless important. We want a capability to compute with
them but do not want the burden of verifying the (HDLS) and (HDA) properties. In particular,
we want a purely inductive definition of tree domains and paths and, remarkably,
find that the shapes and positions of the container $T_n$ provide that.

Containers are semantically equivalent to normal functors and a special case of analytic functors.
However, while containers talk about the different shapes a data structure can assume,
analytic functors talk about the number of structures of a given size and hence there is no
clear, simple and immediate connection between tree domains and paths on the one hand and
analytic functors on the other hand. Thus we use containers rather than analytic functors to
represent higher dimensional trees. In the rest of this section, we introduce containers and
recall some of the closure properties of containers. This proves sufficient to then show that
all Rogers' trees are indeed containers. While the theory of containers can be developed in
any locally cartesian closed category with W-types and disjoint coproducts, we restrict to the
category Set to keep things simple.

The simplest example of a data type which can be represented by a container is that of lists.
Indeed, any element of the type List(X) of lists of X can be uniquely written as a natural
number n given by the length of the list, together with a function $\{0, \ldots, n-1\} \to X$ which
labels each position within the list with an element from X. Thus

$$\mathrm{List}(X) = \coprod_{n \in \mathbb{N}} \left(\{0, \ldots, n-1\} \to X\right)$$

More generally, we consider data types given by i) shapes which describe the form of the data
type; and ii) for each shape $s \in S$, a set of positions $P(s)$. Thus we define

Definition 4.1 (Container). A container $(S, P)$ consists of a set $S$ and an $S$-indexed family
$P$ of sets, ie a function $P : S \to \mathsf{Set}$.
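
The list example can be sketched in Haskell. Since Haskell has no dependent types, the sketch below approximates the position set $\{0, \ldots, n-1\}$ by `Int` and relies on the convention that the labelling is only consulted on valid positions; the names `ListC`, `toContainer`, `fromContainer` and `shape` are ours.

```haskell
-- A list presented as a shape (its length n) together with a labelling
-- of the positions 0 .. n-1.
data ListC a = ListC Int (Int -> a)

toContainer :: [a] -> ListC a
toContainer xs = ListC (length xs) (xs !!)

fromContainer :: ListC a -> [a]
fromContainer (ListC n f) = map f [0 .. n - 1]

-- Relabelling leaves the shape alone and post-composes the labelling.
instance Functor ListC where
  fmap g (ListC n f) = ListC n (g . f)

-- Erasing the data reveals the underlying shape.
shape :: ListC a -> Int
shape (ListC n _) = n
```

The two conversions are mutually inverse up to the behaviour of the labelling outside $\{0, \ldots, n-1\}$, which is the price of the non-dependent approximation.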

As suggested above, lists can be presented as a container with shapes $\mathbb{N}$ and positions defined
by $P(n) = \{0, \ldots, n-1\}$. Similarly, any binary tree can be uniquely described by its underlying
shape (which is obtained by deleting the data stored at the leaves) and a function
mapping the positions in this shape to the data.

[Figure: a binary tree decomposed into its shape and a labelling of its positions]




The extension of a container is an endofunctor defined as follows: 

Definition 4.2 (Extension of a Container). Let $(S, P)$ be a container. Its extension is the
functor $T_{(S,P)}$ defined by

$$T_{(S,P)}(X) = \coprod_{s \in S} \left(P(s) \to X\right)$$

Thus, an element of $T_{(S,P)}(X)$ is a pair $(s, f)$ where $s \in S$ is a shape and $f : P(s) \to X$ is
a labelling of the positions of $s$ with elements from $X$. The action of $T_{(S,P)}$ on a morphism
$g : X \to Y$ sends the element $(s, f)$ to the element $(s, g \cdot f)$. If $F$ is a functor that is the
extension of a container, then the shapes of that container can simply be calculated as $F1$,
that is, $S = T_{(S,P)}1$. This corresponds to erasing the data in a data structure to reveal the
underlying shape. Containers have many good properties, in particular, many constructions
on functors specialise to containers. These closure properties are summarised below

Theorem 4.3 (Closure properties of Containers [8]). The following are true

- The identity functor is the extension of the container with one shape and one position.

- The constantly $A$ valued functor has shapes $A$ and positions given by $P a = 0$.

- Let $(S_1, P_1)$ and $(S_2, P_2)$ be containers. Then the functor $T_{(S_1,P_1)} + T_{(S_2,P_2)}$ is the
extension $T_{(S,P)}$ of the container $(S, P)$ defined by

$$S = S_1 + S_2 \qquad P(\mathrm{inl}(s)) = P_1 s \qquad P(\mathrm{inr}(s)) = P_2 s$$

- Let $(S_1, P_1)$ and $(S_2, P_2)$ be containers. Then the functor $T_{(S_1,P_1)} \times T_{(S_2,P_2)}$ is the
extension $T_{(S,P)}$ of the container $(S, P)$ defined by

$$S = S_1 \times S_2 \qquad P(s_1, s_2) = P_1 s_1 + P_2 s_2$$
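
As a small worked instance of these closure properties, consider $T_0 X = 1 + X$, the sum of the constant functor at 1 and the identity functor. The calculation below only combines the first three clauses of the theorem.

```latex
\begin{align*}
\text{constant } 1 &: \quad S_1 = 1, \quad P_1(*) = 0 \\
\text{identity} &: \quad S_2 = 1, \quad P_2(*) = 1 \\
\text{sum} &: \quad S = 1 + 1, \quad P(\mathrm{inl}\,{*}) = 0, \quad P(\mathrm{inr}\,{*}) = 1 \\
T_{(S,P)}(X) &= (0 \to X) + (1 \to X) \cong 1 + X
\end{align*}
```

The two shapes are the empty tree, with no positions, and the one-node tree, with a single position.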

In order to show that containers are closed under fixed points, we need to introduce the notion
of an n-ary container to represent n-ary functors. For the purposes of our work, we only need
bifunctors and so we restrict ourselves to binary containers

Definition 4.4 (Bi-Containers). A bi-container consists of two containers with the same underlying
shape. That is, a set $S$ and a pair of functions $P_1, P_2 : S \to \mathsf{Set}$. The extension of a
binary container is a bifunctor given by

$$T_{(S,P_1,P_2)}(X, Y) = \coprod_{s \in S} (P_1 s \to X) \times (P_2 s \to Y)$$

Given a bi-container $(S, P_1, P_2)$, the functor $X \mapsto \mu Y. F(X, Y)$ is a container as demonstrated
by the following theorem

Theorem 4.5 (Fixed Points of Containers [8]). Let $(S, P_1, P_2)$ be a bi-container and let
$F(X, Y) = T_{(S,P_1,P_2)}(X, Y)$ be its extension. Then the functor $X \mapsto \mu Y. F(X, Y)$ is a container
with shapes given by

$$\bar{S} = \mu Y. T_{(S,P_2)}(Y)$$

and positions given by

$$\bar{P}(s, f) = P_1 s + \coprod_{p \in P_2 s} \bar{P}(f p)$$

To understand this theorem, think of an element of $\mu Y. F(X, Y)$ as a tree with a top $F$-layer
which stores elements from $X$ at the $X$-positions in this $F$-layer and further elements
of $\mu Y. F(X, Y)$ at the $Y$-positions of this $F$-layer. We know that the shapes of the functor
$X \mapsto \mu Y. F(X, Y)$ must be this functor at 1, ie $\mu Y. F(1, Y)$. More concretely, a shape for
$\mu Y. F(X, Y)$ must thus be an $F$-shape for the top layer of a tree and, for each $Y$-position
of that shape, we must have a shape of $\mu Y. F(X, Y)$ to represent the tree recursively stored at
that position. As for the positions for storing data of type $X$ in a tree with shape $(s, f)$ where
$s \in S$ and $f : P_2 s \to \mu Y. F(1, Y)$, these should be either the positions for storing $X$-data in the
top layer given by $P_1 s$ or, for each position $p \in P_2 s$, a position in the subtree stored at that
position. Since that subtree has shape $f p$, we end up with the formula above.

Applying these closure properties, we derive the following 

Theorem 4.6. Rogers' n-dimensional non-empty tree functor $R_n$ is the extension of a container.
That is, $R_n = T_{(S_n^+, P_n^+)}$ where

$$S^+_{n+1} = \mu Y. 1 + R_n Y$$
$$P^+_{n+1}(\mathrm{inl}\,{*}) = 1$$
$$P^+_{n+1}(\mathrm{inr}(s, f)) = 1 + \coprod_{p \in P^+_n s} P^+_{n+1}(f p)$$

As a corollary, $T_n$ is also the extension of a container. That is, $T_n = T_{(S_n, P_n)}$ where $S_n =
1 + S_n^+$, $P_n(\mathrm{inl}\,{*}) = 0$ and $P_n(\mathrm{inr}\,s) = P_n^+ s$. What is particularly nice about the container
presentation of $T_n$ is that the shapes $S_n$ are in bijection with the tree domains while the paths
in any tree domain are in bijection with the positions of the equivalent shape. Further, the
paths are given by a purely inductive definition.
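
To see the inductive definition at work in the smallest case, take $n = 0$, assuming (as the closure properties give for the identity functor) that $R_0$ has one shape with one position. Unfolding the fixed point formula of Theorem 4.5 for $R_1 X = \mu Y. X \times (1 + R_0 Y)$ gives:

```latex
\begin{align*}
S^+_1 &= \mu Y.\, 1 + R_0 Y \cong \mu Y.\, 1 + Y \cong \mathbb{N} \\
P^+_1(\mathrm{inl}\,{*}) &= 1 \\
P^+_1(\mathrm{inr}({*}, f)) &= 1 + P^+_1(f\,{*})
\end{align*}
```

So the shape corresponding to the natural number $k$ has $k + 1$ positions, which recovers $R_1 = \mathrm{List}^+$: a non-empty list is determined by its length minus one and a labelling of its positions.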

5 Automata, (Co)algebraically 

We show that the classical automata-theoretic results about determinisation and minimisation 
extend to the higher-dimensional automata of Rogers. Using our reformulation of Rogers' 
structures in Section 3 and the container-technology of Section 4, these results become special 
cases of the classical results about automata as algebras for a functor, a theory initiated by 
Arbib and Manes [2-4]. We also extend Rogers' work by appropriate notions of signature
and deterministic automata. 

We should like to point out that none of the constructions or proofs in this section requires the 
explicit manipulation of trees or tree domains. 

Before starting on the topic of the section, we review the situation for string and tree 
automata. Ignoring initial and accepting states, the situation is depicted in 



(strings)     non-det                                  det
top-down      $Q \to \mathcal{P}(A \times Q)$          $Q \to Q^A$
bottom-up     $A \times Q \to \mathcal{P}Q$            $A \times Q \to Q$

(trees)       non-det                                  det
top-down      $Q \to \mathcal{P}(FQ)$
bottom-up     $FQ \to \mathcal{P}Q$                    $FQ \to Q$



For both string and tree automata, the relationship between non-deterministic top-down automata
(=coalgebras in the Kleisli-category of $\mathcal{P}$) and non-deterministic bottom-up automata
(=algebras in the Kleisli-category of $\mathcal{P}$) is straightforward: both $Q \to \mathcal{P}(FQ)$ and
$FQ \to \mathcal{P}Q$ are just two different ways of denoting a relation $\subseteq Q \times FQ$. The relationship between
deterministic top-down automata (=coalgebras) and deterministic bottom-up automata (=algebras)
is given in the string case by the adjunction $A \times - \dashv (-)^A$ (this situation is generalised
and studied in [3]). In the tree case, $F$ is an arbitrary functor on Set and so has in general no
right-adjoint.¹ It is still possible to describe deterministic top-down tree automata but they are
strictly less expressive [6, Chapter 1.6].

The familiar move from non-deterministic to deterministic string automata can be summarised
as follows. Any non-deterministic transition structure $f : Q \to \mathcal{P}(A \times Q)$ can be lifted to a
map $\bar{f} : \mathcal{P}Q \to \mathcal{P}(A \times Q)$ given by $\bar{f}(S) = \bigcup_{q \in S} f(q)$. Using $\mathcal{P}(A \times Q) \cong (\mathcal{P}Q)^A$, $\bar{f}$ is
a deterministic transition structure $\mathcal{P}Q \to (\mathcal{P}Q)^A$ on $\mathcal{P}Q$. Determinisation for tree automata
will be discussed below.
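
The string case of this construction is short enough to sketch in Haskell. The code below is an illustration under our own assumptions: finite subsets are represented as sorted duplicate-free lists, and `NDet`, `liftUnion`, `determinise`, `runDet` and `exampleNFA` are hypothetical names.

```haskell
import Data.List (nub, sort)

-- A non-deterministic transition structure f : Q -> P(A x Q).
type NDet a q = q -> [(a, q)]

-- The lifting of f to sets of states: the union of f(q) for q in S.
liftUnion :: (Eq a, Eq q) => NDet a q -> [q] -> [(a, q)]
liftUnion f s = nub (concatMap f s)

-- Reading P(A x Q) as (PQ)^A gives a deterministic structure on PQ.
determinise :: (Eq a, Ord q) => NDet a q -> [q] -> a -> [q]
determinise f s a = sort (nub [ q | (b, q) <- liftUnion f s, a == b ])

-- Running the determinised automaton on a word.
runDet :: (Eq a, Ord q) => NDet a q -> [q] -> [a] -> [q]
runDet f = foldl (determinise f)

-- Hypothetical example with states 0 and 1 over the letters 'a' and 'b'.
exampleNFA :: NDet Char Int
exampleNFA 0 = [('a', 0), ('a', 1)]
exampleNFA 1 = [('b', 1)]
exampleNFA _ = []
```

Starting from the set `[0]`, the word `"ab"` leads to `[1]`, while the word `"b"` leads to the empty set of states.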

5.1 Signatures 

Rogers' automata of Definition 2.5 do not associate arities to the symbols in the alphabet
$\Sigma$. For example, in the tree automata of Example 2.7, one $\sigma$ may appear in two triples
$(q, \sigma, l_1), (q, \sigma, l_2) \in \delta$ where $l_1$ and $l_2$ are lists of different lengths. Thus the same 'function
symbol' $\sigma$ may have different arities and the $\Sigma$-labelled trees are not exactly elements of
a term algebra.

To rectify this situation, we must ask ourselves what is the appropriate notion of arity if
operations take as input higher dimensional trees. In the two-dimensional case arities are
natural numbers: the arity of a function symbol $\sigma$ is the number of its arguments. But, in
container terminology, $\mathbb{N}$ is just the set of shapes of $T_1 = \mathrm{List}$. Thus, when operations of
a signature are consuming higher dimensional trees, their arities should be the shapes of trees
one dimension lower. This leads to

Definition 5.1 ((n+1)-dimensional signature). An (n+1)-dimensional signature is a set
$\Sigma$ with a map $\Sigma \to T_n(1)$.

Example 5.2. 1. A 1-dimensional signature is a map $r : \Sigma \to \{0, 1\}$, due to the isomorphism
$T_0(1) \cong \{0, 1\}$. We will see below (Example 5.4) that 0 specifies nullary operations
and 1 specifies unary operations.
2. A 2-dimensional signature is a signature in the usual sense, due to the isomorphism
$T_1(1) \cong \mathbb{N}$ that maps a list to its length.

The next step is to associate to each signature a functor in such a way that the initial algebra
for the functor contains the elements of the language accepted by an automaton. The simplest
and most elegant way to do this is to construct a container and use its extension. Recalling
that $P_n : S_n \to \mathsf{Set}$ is the container whose extension is $T_n$ and that $S_n = T_n(1)$, we can turn
any signature $r : \Sigma \to T_n(1)$ into the container $(\Sigma, P_r)$ as follows

$$P_r : \Sigma \xrightarrow{\;r\;} S_n \xrightarrow{\;P_n\;} \mathsf{Set}. \qquad (1)$$

Definition 5.3 ($F_\Sigma$). The functor $F_{(\Sigma, r)}$, or briefly $F_\Sigma$, associated to a signature $r : \Sigma \to
T_n(1)$ is the extension $T_{(\Sigma, P_r)}$ of the container (1), that is, $F_\Sigma(X) = \coprod_{\sigma \in \Sigma} \left(P_n(r(\sigma)) \to X\right)$.

Example 5.4. 1. A 1-dimensional signature $r : \Sigma \to \{0, 1\}$ gives rise to the functor $F_\Sigma(X) =
\Sigma_0 + \Sigma_1 \times X$ where $\Sigma_i = r^{-1}(i)$.
2. A 2-dimensional signature $r : \Sigma \to \mathbb{N}$ gives rise to the functor $F_\Sigma(X) = \coprod_{\sigma \in \Sigma} X^{r(\sigma)}$
usually associated with a signature.
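
In the 2-dimensional case the difference between trees over an alphabet and terms over a signature is just an arity check, which can be sketched as follows. The names `Term`, `respectsArity` and `exampleArity` are ours, and the signature is a hypothetical one chosen for illustration.

```haskell
-- Trees over an alphabet: each node carries a symbol and a list of subtrees.
data Term s = Term s [Term s]

-- Given r : Sigma -> N, a tree is a term for the signature when every node
-- labelled sigma has exactly r(sigma) immediate subtrees.
respectsArity :: (s -> Int) -> Term s -> Bool
respectsArity r (Term a ts) =
  length ts == r a && all (respectsArity r) ts

-- Hypothetical signature: 'f' is binary, every other symbol is a constant.
exampleArity :: Char -> Int
exampleArity 'f' = 2
exampleArity _   = 0
```

Here `Term 'f' [Term 'c' [], Term 'c' []]` is a term for the signature, whereas `Term 'f' [Term 'c' []]` is only a tree over the alphabet.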



¹ In fact, if $F : \mathsf{Set} \to \mathsf{Set}$ has a right-adjoint, then $F \cong A \times -$ for $A = F1$.



The next two propositions, which one might skip as a pedantic technical interlude, make the 
relation between an alphabet $\Sigma'$ and a signature $\Sigma \to T_n(1)$ precise. The first proposition 
says that trees for the signature $\Sigma \to T_n(1)$ (i.e. elements of the initial $F_\Sigma$-algebra) are also 
trees over the alphabet $\Sigma$ (i.e. elements of $T_{n+1}(\Sigma)$). The second proposition says that trees 
over the alphabet $\Sigma'$ are the same as trees over the signature $\Sigma' \times T_n(1) \to T_n(1)$. 

Proposition 5.5. For each $(n+1)$-dimensional signature $\Sigma \to T_n(1)$, there is a canonical 
$F_\Sigma$-algebra structure on $T_{n+1}(\Sigma)$. Moreover, the unique algebra morphism from the initial 
$F_\Sigma$-algebra to $T_{n+1}(\Sigma)$ is injective. 

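Proof. The carrier of the initial $F_\Sigma$-algebra is $\mu Y.F_\Sigma(Y)$ and $R_{n+1}(\Sigma)$ is $\mu Y.\Sigma \times T_n(Y)$. 
The injective morphism in question arises from the injective map of type $F_\Sigma(Y) \to \Sigma \times T_n(Y)$, 
that is of type $\left(\coprod_{\sigma \in \Sigma} (P_n(r(\sigma)) \to Y)\right) \to \Sigma \times \left(\coprod_{s \in S_n} (P_n(s) \to Y)\right)$, which maps 
pairs $(\sigma, f) \in F_\Sigma(Y)$ to $(\sigma, (r(\sigma), f))$. 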

Proposition 5.6. Let $\Sigma'$ be a set (called an alphabet) and $\Sigma$ be the signature given by the 
projection $r : \Sigma' \times T_n(1) \to T_n(1)$. Then $R_{n+1}(\Sigma')$ is isomorphic to the (carrier of the) 
initial $F_\Sigma$-algebra. 

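Proof. The carrier of the initial $F_\Sigma$-algebra is $\mu Y.F_\Sigma(Y)$ and $R_{n+1}(\Sigma')$ is $\mu Y.\Sigma' \times T_n(Y)$. 
But $\Sigma' \times T_n(Y) = \Sigma' \times \coprod_{s \in S_n} (P_n(s) \to Y) \cong \coprod_{(\sigma', s) \in \Sigma' \times S_n} (P_n(s) \to Y) = \coprod_{\sigma \in \Sigma} (P_n(r(\sigma)) \to Y) = F_\Sigma(Y)$. 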
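
As a sanity check, the chain of isomorphisms in this proof can be spelled out in the 2-dimensional case $n = 1$, where $T_1 = \mathsf{List}$ and $S_1 = \mathbb{N}$. We assume the usual container presentation of lists, in which $P_1(k)$ is a set with $k$ elements. This is only an illustrative instance and not part of the original argument:

```latex
\Sigma' \times \mathsf{List}(Y)
  \;\cong\; \Sigma' \times \coprod_{k \in \mathbb{N}} Y^{k}
  \;\cong\; \coprod_{(\sigma', k) \in \Sigma' \times \mathbb{N}} Y^{k}
  \;=\; F_\Sigma(Y),
  \qquad \text{where } \Sigma = \Sigma' \times \mathbb{N},\ r(\sigma', k) = k .
```

In words, a tree over the alphabet $\Sigma'$ is a term over the signature in which every letter is available at every arity.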

5.2 Higher Dimensional Automata 

Before giving a coalgebraic formulation of Rogers' automata (Definition 5.10), we introduce 
the corresponding notion of deterministic automaton (Definition 5.7), which has a particularly 
simple definition of accepted language and is used in the next subsection on determinisation 
and minimisation. (Recall Definition 5.3 of $F_\Sigma$.) 

Definition 5.7. A deterministic $(n+1)$-dimensional automaton for the signature $\Sigma \to T_n(1)$ 
is a function 

$F_\Sigma Q \to Q$. 

Example 5.8. 1. To obtain the usual string automata over an alphabet $A$ we consider a 1-dimensional 
signature $\Sigma$ consisting of the elements of $A$ as unary operation symbols plus 
one additional nullary operation symbol (see Example 5.4.1). $F_\Sigma(Q)$ is then $1 + A \times Q$. 
2. A 2-dimensional automaton is the usual deterministic bottom-up tree automaton [6]. 

Definition 5.9. A state $q$ in a deterministic $(n+1)$-dimensional automaton for the signature 
$\Sigma \to T_n(1)$ accepts an $(n+1)$-dimensional tree $t$ if the unique morphism from the initial 
$F_\Sigma$-algebra maps $t$ to $q$. 
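
In the 2-dimensional case, Definitions 5.7 and 5.9 can be sketched in Python as follows. The representation of trees as nested pairs, the function names `run` and `accepts`, and the example algebra `parity_delta` with its symbols `zero`, `one` and `add` are all assumptions chosen for illustration. The point is only that the unique morphism from the initial algebra is a bottom-up fold.

```python
from typing import Callable, Tuple

# A tree over a 2-dimensional signature: (symbol, (subtree, ..., subtree)).
Tree = Tuple[str, tuple]


def run(delta: Callable[[str, tuple], str], tree: Tree) -> str:
    """Unique algebra morphism from the initial algebra: evaluate bottom-up."""
    symbol, children = tree
    return delta(symbol, tuple(run(delta, c) for c in children))


def parity_delta(symbol: str, states: tuple) -> str:
    """Hypothetical algebra F_Sigma(Q) -> Q with Q = {"even", "odd"}.

    Symbols: "zero" and "one" (arity 0), "add" (arity 2).
    """
    if symbol == "zero" and states == ():
        return "even"
    if symbol == "one" and states == ():
        return "odd"
    if symbol == "add" and len(states) == 2:
        return "even" if states[0] == states[1] else "odd"
    raise ValueError("not an element of F_Sigma(Q)")


def accepts(delta: Callable[[str, tuple], str], state: str, tree: Tree) -> bool:
    """Definition 5.9: the state accepts the tree if the fold maps the tree to it."""
    return run(delta, tree) == state


# Usage: the term add(one, add(one, zero)) is evaluated to the state "even".
example = ("add", (("one", ()), ("add", (("one", ()), ("zero", ())))))
print(run(parity_delta, example))  # prints "even"
```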

We adapt Rogers' definition of automata given in Definition 2.5: 

Definition 5.10. An $(n+1)$-dimensional automaton for the signature $\Sigma \to T_n(1)$ is a function 

$Q \to \mathcal{P}(F_\Sigma(Q))$. 

Example 5.11. 1. In the case of string automata, $F_\Sigma(Q)$ is $1 + A \times Q$ and an automaton 
becomes $Q \to \mathcal{P}(1 + A \times Q) \cong 2 \times (\mathcal{P}Q)^A$. The map $Q \to 2$ encodes the accepting 
states and the map $Q \to (\mathcal{P}Q)^A$ gives the transition structure. 
2. Comparing with the previous definition, a 2-dimensional automaton $\delta : Q \to \mathcal{P}(F_\Sigma(Q))$ 
can still be considered as a set of triples $\delta \subseteq Q \times (\Sigma \times \mathsf{List}(Q))$, but not all such triples 
are allowed: for $(q, \sigma, (q_1, \ldots, q_n)) \in \delta$ it has to be the case that the arity of $\sigma$ is $n$. 
This coincides with the notion of a non-deterministic top-down tree automaton as in [6]. 

We have indicated how to define the accepted language of a (non-deterministic) automaton in 
Example 2.7. In particular, we found it natural to give a bottom-up formulation. We will now 
generalise this definition. The basic idea is as follows. We first observe that we cannot use 
the final coalgebra for the functor $\mathcal{P}F_\Sigma$ since this coalgebra would take the branching given 
by $\mathcal{P}$ into account. Instead, the correct idea is to consider a non-deterministic automaton as 
an $\overline{F}_\Sigma$-coalgebra in the category of relations. We first note the following proposition which 
follows from $F_\Sigma$ being the extension of a container. 

Proposition 5.12. $F_\Sigma$ preserves weak pullbacks. 

Now let Rel denote the category of sets and relations. 

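Definition 5.13. Given a functor $F$ on Set we define $\overline{F}$ to map sets $X$ to $\overline{F}X = FX$ and 
to map relations $X \xleftarrow{\ \pi_0\ } R \xrightarrow{\ \pi_1\ } Y$ to $\overline{F}R = F(\pi_0)^\circ ; F(\pi_1)$ where $(-)^\circ$ denotes relational 
converse and ';' relational composition. 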

Barr [5] showed that $\overline{F}$ is a functor on Rel if and only if $F$ preserves weak pullbacks. A 
theorem of de Moor [7, Theorem 5] and Hasuo et al. [10, Theorem 3.1] then guarantees that 
the initial $F$-algebra $\iota : FI \to I$ in Set gives rise to the final $\overline{F}$-coalgebra $\iota^\circ : I \to \overline{F}I$ 
in Rel. This gives a 'coinductive' definition of the accepted language of a non-deterministic 
automaton: 

Definition 5.14. The language accepted by a state $q$ of an $(n+1)$-dimensional automaton 
$Q \to \mathcal{P}(F_\Sigma(Q))$ is given by the unique arrow (in the category Rel) into the final 
$\overline{F}_\Sigma$-coalgebra. 

Note that this definition associates to $q$ a subset of the carrier $I$ of the initial $F_\Sigma$-algebra. 
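
For 2-dimensional automata, our reading of Definition 5.14 can be sketched in Python by unfolding the relation recursively over the structure of the tree: a tree $\sigma(t_1, \ldots, t_n)$ lies in the language of $q$ exactly when some triple $(q, \sigma, (q_1, \ldots, q_n))$ of the automaton has each $t_i$ in the language of $q_i$. The automaton `DELTA`, its states `any` and `has_b`, and the function names are hypothetical and serve only as an example.

```python
from typing import FrozenSet, Set, Tuple

# A transition is a triple (q, sigma, (q_1, ..., q_n)) as in Example 5.11.2.
Transition = Tuple[str, str, Tuple[str, ...]]

# Hypothetical automaton over symbols a, b (arity 0) and f (arity 2):
# "any" has every tree in its language, "has_b" those containing a leaf b.
DELTA: Set[Transition] = {
    ("any", "a", ()),
    ("any", "b", ()),
    ("any", "f", ("any", "any")),
    ("has_b", "b", ()),
    ("has_b", "f", ("has_b", "any")),
    ("has_b", "f", ("any", "has_b")),
}


def states_for(delta: Set[Transition], tree) -> FrozenSet[str]:
    """Return every state q such that `tree` lies in the language of q."""
    symbol, children = tree
    child_sets = [states_for(delta, c) for c in children]
    return frozenset(
        q
        for (q, sigma, qs) in delta
        if sigma == symbol
        and len(qs) == len(child_sets)
        and all(qi in s for qi, s in zip(qs, child_sets))
    )


def in_language(delta: Set[Transition], q: str, tree) -> bool:
    """Membership of `tree` in the language accepted by the state q."""
    return q in states_for(delta, tree)
```

Note that the recursion only ever inspects the finite tree, never the branching of the automaton itself, in line with the remark that the branching given by the powerset should not be taken into account.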

It is clear from the constructions that every deterministic automaton can be considered as a 
non-deterministic automaton, and that the two notions of accepted language agree. We make 
this precise with the following definition and proposition. 

Definition 5.15. The non-deterministic automaton corresponding to the deterministic automaton 
$f : F_\Sigma Q \to Q$ is given by $f^\circ : Q \to \mathcal{P}F_\Sigma Q$ (where $f^\circ$ is again the converse 
relation of (the graph of) $f$). 

Proposition 5.16. The deterministic automaton $F_\Sigma Q \to Q$ accepts $t$ in $q$ if and only if the 
corresponding non-deterministic automaton $Q \to \mathcal{P}F_\Sigma Q$ has $t$ in the language of $q$. 

5.3 Determinisation and Minimisation 

This section follows the work by Arbib and Manes [2-4] on automata as algebras for a functor 
on a category. 

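Determinisation. First observe that the elementship relation $\ni\ \subseteq \mathcal{P}X \times X$ can be lifted to 
$\overline{F}(\ni) \subseteq F\mathcal{P}X \times FX$, which can be written as 

$\tau_X : F\mathcal{P}X \to \mathcal{P}FX$ (2) 

$\tau_X$ is well-known to be natural in $X$ whenever $F$ preserves weak pullbacks. Now, given a 
non-deterministic automaton 

$Q \to \mathcal{P}F_\Sigma Q$ (3) 

we first turn it from top-down to bottom-up by going to the converse relation 

$F_\Sigma Q \to \mathcal{P}Q$ (4) 

and then lift it from $F_\Sigma Q$ to $\mathcal{P}F_\Sigma Q$ and precompose with $\tau$ to obtain 

$F_\Sigma \mathcal{P}Q \xrightarrow{\ \tau_Q\ } \mathcal{P}F_\Sigma Q \to \mathcal{P}Q$ (5) 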

Remark 5.17. The step from (4) to (5) is a special case of [4, Lemma 7] (where $\mathcal{P}$ can be an 
arbitrary monad on a base category). 

Theorem 5.18. Given an $(n+1)$-dimensional automaton $Q \to \mathcal{P}F_\Sigma Q$ (Definition 5.10) 
with accepting states $Q_0 \subseteq Q$, the state $Q_0$ in the corresponding deterministic automaton (5) 
accepts the same language. 
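
For 2-dimensional automata the construction of (5) specialises to a subset construction on tree automata, which the following Python sketch illustrates. The example automaton `DELTA` and the names `determinise` and `run` are assumptions made for this illustration. The sketch only shows the determinised transition function and its bottom-up evaluation, and makes no claim beyond the 2-dimensional case.

```python
from typing import Callable, FrozenSet, Set, Tuple

Transition = Tuple[str, str, Tuple[str, ...]]  # (q, sigma, (q_1, ..., q_n))
Subset = FrozenSet[str]

# Hypothetical automaton over symbols a, b (arity 0) and f (arity 2).
DELTA: Set[Transition] = {
    ("any", "a", ()),
    ("any", "b", ()),
    ("any", "f", ("any", "any")),
    ("has_b", "b", ()),
    ("has_b", "f", ("has_b", "any")),
    ("has_b", "f", ("any", "has_b")),
}


def determinise(delta: Set[Transition]) -> Callable[[str, Tuple[Subset, ...]], Subset]:
    """Return an algebra F_Sigma(P Q) -> P Q built from delta as in (5)."""

    def det_delta(symbol: str, subsets: Tuple[Subset, ...]) -> Subset:
        # tau selects the tuples (q_1, ..., q_n) with q_i in subsets[i]; the
        # lifted converse relation collects all q with a matching transition.
        return frozenset(
            q
            for (q, sigma, qs) in delta
            if sigma == symbol
            and len(qs) == len(subsets)
            and all(qi in s for qi, s in zip(qs, subsets))
        )

    return det_delta


def run(det_delta: Callable[[str, Tuple[Subset, ...]], Subset], tree) -> Subset:
    """Evaluate a tree (symbol, children) bottom-up in the determinised automaton."""
    symbol, children = tree
    return det_delta(symbol, tuple(run(det_delta, c) for c in children))
```

Evaluating a tree in the determinised automaton yields a subset of $Q$, namely the states of the original automaton whose language contains that tree.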

Minimisation. A deterministic automaton with a set of accepting states is a structure 

$F_\Sigma Q \xrightarrow{\ \delta\ } Q \xrightarrow{\ \alpha\ } 2$ (6) 

We denote by $\iota : F_\Sigma I \to I$ the initial $F_\Sigma$-algebra and by $\rho : I \to Q$ the unique morphism 
given by initiality. The map $\beta = \alpha \circ \rho$ is called the behaviour of (6) because $\beta(t)$ tells us 
for any $t \in I$ whether it belongs to the accepted language or not. Note that the automata (6) 
form a category, denoted DAut, which has as morphisms $f : (\delta, \alpha) \to (\delta', \alpha')$ those algebra 
morphisms $f : \delta \to \delta'$ satisfying $\alpha' \circ f = \alpha$. 

Definition 5.19 ([2, Section 4]). Let $\iota : F_\Sigma I \to I$ be the initial $F_\Sigma$-algebra. The automaton 
(6) is reachable if the algebra morphism $\iota \to \delta$ is surjective and it is a realisation of $\beta : I \to 2$ 
iff there is a morphism $(\iota, \beta) \to (\delta, \alpha)$ in DAut. Moreover, (6) is a minimal realisation of $\beta$ 
iff for all reachable realisations $(\delta', \alpha')$ of $\beta$ there is a unique surjective DAut-morphism 

$f : (\delta', \alpha') \to (\delta, \alpha)$. 

Different minimal realisation theorems can be found in Arbib and Manes [2-4] and Adámek 
and Trnková [1]. The theorem below follows [1, V.1.3]. 

Theorem 5.20. Let $\Sigma$ be an $(n+1)$-dimensional signature, $F_\Sigma$ the corresponding functor 
and $\iota : F_\Sigma I \to I$ the initial $F_\Sigma$-algebra. Then every map $\beta : I \to 2$ has a minimal realisation. 

Proof. Let $e_i : (\iota, \beta) \to (\delta_i, \alpha_i)$ be the collection of all surjective DAut-morphisms with 
domain $(\iota, \beta)$. Let $f_i$ be the multiple pushout of $e_i$ in Set and $g = f_i \circ e_i$. The universal 
property gives us $\alpha$ with $\alpha \circ g = \beta$. Being a container, $F_\Sigma$ is finitary and, therefore [1, V.1.5], 
preserves the multiple pushout. Hence there is $\delta$ with $\delta \circ F_\Sigma g = g \circ \iota$. Since $F_\Sigma$ preserves, like 
any set-functor, surjective maps, $\delta$ is uniquely determined. We have constructed an automaton 
$(\delta, \alpha)$ that realises $\beta$. It is minimal because any other reachable realisation appears as one of 
the $e_i$. 



6 Conclusion 



This paper applies (co)algebraic and categorical techniques to Rogers' recent work in lin- 
guistics on higher dimensional trees. In particular, we have given an algebraic formulation 
of Rogers' higher dimensional trees and automata. Our analysis shows that, just as ordinary 
trees, the higher dimensional trees organise themselves in an initial algebra for a set-functor. 
This allowed us to use Arbib and Manes' theory of automata as algebras for a functor, yielding 
simple definitions of accepted language and straightforward constructions of determinisation 
and minimisation. 

More importantly, as we have only been able to hint at, our algebraic formulation gives us 
the possibility to write programs manipulating the trees in functional programming languages 
like Haskell that support polymorphic algebraic data types. Future work will be needed to 
substantiate our claim that, in fact, our abstract categorical treatment is very concrete in the 
sense that it will give rise to simple implementations of algorithms manipulating higher di- 
mensional trees. A good starting point could be Rogers' characterisation of non-strict tree 
adjoining grammars as 3-dimensional automata [11, Thm 5.2]. 